Abstract

Background: Our aim was to assess diagnostic agreement among the neurologists participating in the Neurological Disorders in Central Spain 2 (NEDICES-2) study
when assigning a diagnosis of essential tremor (ET) vs. no ET.

Methods: Clinical histories and standardized video-taped neurological examinations of 26 individuals (11 ET, seven Parkinson's disease, three diagnostically
unclear, four normal, one with a tremor disorder other than ET) were provided to seven consultant neurologists, six neurology residents, and five neurology research
fellows (18 neurologists total). For each of the 26 individuals, neurologists were asked to assign a diagnosis of "ET" or "no ET" using diagnostic criteria proposed by
the Movement Disorders Society (MDS). Inter-rater agreement was assessed with both percent concordance and non-weighted κ statistics.

Results: Overall κ was 0.61 (substantial agreement), with no differences between consultant neurologists (κ=0.60), neurology residents (κ=0.61), and neurology
research fellows (κ=0.66) in subgroup analyses. Subanalyses of agreement restricted to the 15 subjects with a previous diagnosis of ET (11 patients) or
a previous diagnosis of being normal (four individuals) showed an overall κ of 0.51 (moderate agreement).

Discussion: In a population-based epidemiological study, substantial agreement was demonstrated for the diagnosis of ET among neurologists of different levels of
expertise. However, agreement was lower than that previously reported using the Washington Heights-Inwood Genetic Study of Essential Tremor criteria, and a
head-to-head comparison is needed to assess which is the tool of choice in epidemiological research in ET.
Introduction

The ideal gold standard for the diagnosis of a disease is an easily 
identifiable pathological finding or, in the absence of this, a disease- 
specific biological marker. 1 The absence of biomarkers or diagnostic 
pathological findings for many neurological disorders adds uncertainty 
to their diagnosis. 2 

For this reason, the diagnosis of these neurological disorders relies 
on expert clinical assessment, using previously established diagnostic 
criteria. 3,4 Thus, it is critically important to determine the diagnostic
agreement among experts. Inter-rater agreement in the diagnosis of 
essential tremor (ET) has been previously assessed by Louis et al. in a 
study of 226 subjects, which demonstrated a diagnostic concordance of 
80% and a weighted κ statistic of 0.84 between two neurologists
specializing in movement disorders who used the Washington 
Heights-Inwood Genetic Study of Essential Tremor (WHIGET) 
protocol and clinical criteria. 

The Neurological Disorders in Central Spain 2 (NEDICES-2) is a 
population-based, closed cohort study that will assess over 10,000 
subjects from several populations in central Spain; it will also include a 
biobank. All participants will be screened and, if necessary, assessed by 
a neurologist for the presence of several neurological conditions (i.e., 
Parkinson's disease, ET, mild cognitive impairment, dementia, 
transient ischemic attacks, stroke, headaches, sleep disorders, and 
oro-linguo-facial dyskinesia). Our aim here was to perform a reliability
study among the participating neurologists with respect to the diagnosis
of ET vs. no ET.

Methods 

The NEDICES-2 is a population-based epidemiological study, 
which will include over 10,000 subjects aged 55 years and older from 
the regions of Madrid, Avila, Segovia, Burgos, and Salamanca. Face- 
to-face interviews will include a comprehensive questionnaire on 
demographics, current medications, medical conditions, and lifestyle 
habits; biological samples (blood, saliva, urine, and hair) will be 
obtained at baseline. Presently, the project is in the pilot study phase. 
The Clinical Research Ethics Committee of the Hospital 12 de 
Octubre Research Institute has approved the protocol of the 
NEDICES-2 study and its pilot study. 

The work was conducted at the University Hospital 12 de Octubre
in Madrid (Spain), which is the tertiary care center coordinating the
NEDICES-2 project. Twenty-six patients were selected from the
database of the movement disorders clinic of this institution by an
independent team of researchers (not involved in this agreement
study); the patients had signed informed consent for the research use of
their data. The patients were selected in an attempt to cover the wide
spectrum of tremor presentations, including severe ET, mild or
moderate ET, unclear tremor diagnosis, and no tremor at
all (normal). Among the selected patients, there were four individuals
with a severe disabling postural and kinetic tremor and a diagnosis of
ET ("severe ET" category; cases 1, 8, 9, and 12), seven individuals
with previous diagnoses of mild to moderate ET ("mild/moderate ET"
category; cases 3, 5, 10, 16, 17, 19, and 23), four individuals with a
diagnosis of no tremor or physiological tremor and a completely normal
neurological examination ("normal" category; cases 4, 13, 21, and 22),
seven patients with a diagnosis of Parkinson's disease ("PD" category;
cases 7, 11, 14, 20, 24, 25, and 26), one subject with a diagnosis of
a tremor disorder other than ET ("other tremor" category; case 18), and
three individuals who were considered a priori to be diagnostically
unclear due to the presence of mild postural and intention tremor
along with parkinsonian signs, such as hypomimia and mild
bradykinesia ("ET/PD" category; cases 2, 6, and 15). These three
subjects did not have a definite diagnosis, and the differential
included ET and Parkinson's disease.

A questionnaire was mailed to seven consultant neurologists, six
neurology residents, and five neurology research fellows (18
neurologists in total) who worked at the Department of Neurology. They
were provided with a history of the clinical presentation of the 26
subjects and a video-recording of a standardized neurological
examination, including assessment of head, trunk, and upper limb tremor
at rest and during sustained arm extension, pouring water, drinking
water, and the finger-to-nose maneuver. The 18 neurologists were blinded
to the diagnosis previously assigned by clinical neurologists with
expertise in movement disorders, and independently assessed the
information and provided a diagnosis. The possible answers for each
subject were "ET" or "no ET", assessed using the diagnostic criteria
proposed by the Movement Disorders Society (MDS). 6

Inter-rater agreement was assessed with concordance (i.e., the
percentage of the 18 neurologists who agreed with the clinic-assigned
diagnosis of "ET" or "no ET") and was also analyzed by means of a
non-weighted κ statistic for multiple raters with two possible outcomes
(Stata 12, Stata Corp, College Station, TX). The κ statistic takes
chance agreement into account, whereas concordance does not. 7 The κ
coefficients were graded as proposed by Landis and Koch, 8 as follows:
0-0.2 (slight agreement), 0.21-0.4 (fair agreement), 0.41-0.6 (moderate
agreement), 0.61-0.8 (substantial agreement), and 0.81-1.0 (near
perfect agreement). Subgroup analyses of inter-rater agreement were
also performed according to the expertise of the 18 neurologists
(consultants, research fellows, and residents).
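
The text names the statistic only as a non-weighted κ for multiple raters with two outcomes. Assuming this corresponds to Fleiss' κ (an assumption; the formula is not given in the text), the calculation can be sketched as follows. The symbols are introduced here for illustration: N is the number of subjects (26), n is the number of raters per subject (18), and n_{ij} is the number of raters who assigned subject i to category j ("ET" or "no ET").

```latex
\begin{aligned}
p_j &= \frac{1}{Nn}\sum_{i=1}^{N} n_{ij}, \qquad
P_i = \frac{1}{n(n-1)}\Bigl(\sum_{j} n_{ij}^{2} - n\Bigr), \\
\bar{P} &= \frac{1}{N}\sum_{i=1}^{N} P_i, \qquad
\bar{P}_e = \sum_{j} p_j^{2}, \qquad
\kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e}.
\end{aligned}
```

Here \bar{P} is the mean observed pairwise agreement and \bar{P}_e is the agreement expected by chance from the marginal proportions p_j. Subtracting \bar{P}_e is what distinguishes κ from raw percent concordance.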

Results 

A diagnosis of ET was made by 100% of raters in one subject (case 1,
with severe ET), and a diagnosis of "no ET" was made by 100% of
raters in six subjects (case 4 [normal] and cases 7, 11, 24, 25, and 26
[with PD]) (Table 1). The percentage agreement for the diagnostic
categories "mild/moderate ET", "severe ET", and "ET/PD" varied from
case to case. Overall, the highest percentage agreement seemed to be
achieved in the cases previously rated as "PD", "other tremor", and
"normal".

Overall κ was 0.61 (95% CI 0.49-0.64), which is in the range of
moderate to substantial agreement (Table 2). Subgroup analyses
showed that κ was 0.60 (95% CI 0.57-0.69) among consultant
neurologists, 0.66 (95% CI 0.52-0.78) among research fellows, and
0.61 (95% CI 0.49-0.67) among neurology residents (moderate to
substantial agreement in each subgroup). Subanalyses of agreement
restricted to the 15 subjects with a previous diagnosis of ET (11
patients) or a previous diagnosis of being normal (four individuals)
showed an overall κ of 0.51 (95% CI 0.44-0.66, Z = 24.56, p<0.001), and
subgroup analyses showed κ=0.52 among consultant neurologists (95% CI
0.42-0.55, Z = 9.24, p<0.001), κ=0.54 among residents (95% CI 0.41-0.65,
Z = 8.06, p<0.001), and κ=0.48 among research fellows (95% CI
0.39-0.65, Z = 5.91, p<0.001); these values were all in the range of
moderate agreement.

Table 1. Overall and Subgroup Percent Agreement in the Diagnosis of Essential Tremor

A Priori Diagnosis  Case     Overall            Consultants        Residents          Research Fellows
                    Number   ET (%)  No ET (%)  ET (%)  No ET (%)  ET (%)  No ET (%)  ET (%)  No ET (%)
Mild/moderate ET    3        88.9    11.1       100.0   0.0        66.7    33.3       100.0   0.0
                    5        38.9    61.1       71.4    28.6       0.0     100.0      40.0    60.0
                    10       94.4    5.6        100.0   0.0        100.0   0.0        80.0    20.0
                    16       72.2    27.8       85.7    14.3       50.0    50.0       80.0    20.0
                    17       83.3    16.7       85.7    14.3       83.3    16.7       80.0    20.0
                    19       88.9    11.1       100.0   0.0        83.3    16.7       80.0    20.0
                    23       88.9    11.1       100.0   0.0        66.7    33.3       100.0   0.0
Severe ET           1        100.0   0.0        100.0   0.0        100.0   0.0        100.0   0.0
                    8        55.6    44.4       71.4    28.6       33.3    66.7       60.0    40.0
                    9        94.4    5.6        100.0   0.0        100.0   0.0        80.0    20.0
                    12       61.1    38.9       71.4    28.6       50.0    50.0       60.0    40.0
ET/PD               2        88.9    11.1       100.0   0.0        66.7    33.3       100.0   0.0
                    6        27.8    72.2       57.1    42.9       0.0     100.0      20.0    80.0
                    15       11.1    88.9       28.6    71.4       0.0     100.0      0.0     100.0
PD                  7        0.0     100.0      0.0     100.0      0.0     100.0      0.0     100.0
                    11       0.0     100.0      0.0     100.0      0.0     100.0      0.0     100.0
                    14       5.6     94.4       14.3    85.7       0.0     100.0      0.0     100.0
                    20       5.6     94.4       14.3    85.7       0.0     100.0      0.0     100.0
                    24       0.0     100.0      0.0     100.0      0.0     100.0      0.0     100.0
                    25       0.0     100.0      0.0     100.0      0.0     100.0      0.0     100.0
                    26       0.0     100.0      0.0     100.0      0.0     100.0      0.0     100.0
Other tremor        18       16.7    83.3       28.6    71.4       16.7    83.3       0.0     100.0
Normal              4        0.0     100.0      0.0     100.0      0.0     100.0      0.0     100.0
                    13       5.6     94.4       14.3    85.7       0.0     100.0      0.0     100.0
                    21       5.6     94.4       14.3    85.7       0.0     100.0      0.0     100.0
                    22       11.1    88.9       28.6    71.4       0.0     100.0      0.0     100.0

Abbreviations: ET, Essential Tremor; PD, Parkinson's Disease.
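
As a rough check, the overall κ values reported here can be approximately reproduced from the "Overall" columns of Table 1. The sketch below assumes the statistic is Fleiss' κ (not stated in the text; the original analysis was run in Stata). The vote counts are a reconstruction, back-calculated as percentage × 18 raters and cross-checked against the subgroup columns (for example, case 10: 7 consultants + 6 residents + 4 fellows = 17 votes for ET); they are not data published in this form. All names in the code are hypothetical.

```python
from typing import Sequence

N_RATERS = 18  # seven consultants + six residents + five research fellows

# Number of "ET" votes (out of 18) per case, keyed by a priori category.
# Reconstructed from Table 1 percentages; an assumption, not published counts.
ET_VOTES = {
    "mild/moderate ET": {3: 16, 5: 7, 10: 17, 16: 13, 17: 15, 19: 16, 23: 16},
    "severe ET": {1: 18, 8: 10, 9: 17, 12: 11},
    "ET/PD": {2: 16, 6: 5, 15: 2},
    "PD": {7: 0, 11: 0, 14: 1, 20: 1, 24: 0, 25: 0, 26: 0},
    "other tremor": {18: 3},
    "normal": {4: 0, 13: 1, 21: 1, 22: 2},
}


def fleiss_kappa_binary(et_votes: Sequence[int], n_raters: int) -> float:
    """Fleiss' kappa for two categories from per-subject counts of 'ET' votes."""
    n_subjects = len(et_votes)
    # Mean observed pairwise agreement across subjects.
    p_bar = sum(
        (a * a + (n_raters - a) ** 2 - n_raters) / (n_raters * (n_raters - 1))
        for a in et_votes
    ) / n_subjects
    # Chance agreement from the marginal proportion of "ET" ratings.
    p_et = sum(et_votes) / (n_subjects * n_raters)
    p_e = p_et ** 2 + (1.0 - p_et) ** 2
    return (p_bar - p_e) / (1.0 - p_e)


def landis_koch(kappa: float) -> str:
    """Grade kappa (rounded to two decimals) with the bands quoted in Methods."""
    k = round(kappa, 2)
    if k < 0:
        return "below chance (not graded in the text)"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "near perfect"


def votes_for(categories: Sequence[str]) -> list:
    """Flatten the vote counts of the selected a priori categories."""
    return [v for cat in categories for v in ET_VOTES[cat].values()]


# All 26 cases, then only the 15 cases previously labelled ET or normal.
kappa_all = fleiss_kappa_binary(votes_for(list(ET_VOTES)), N_RATERS)
kappa_subset = fleiss_kappa_binary(
    votes_for(["mild/moderate ET", "severe ET", "normal"]), N_RATERS
)

print(f"all 26 cases:  kappa = {kappa_all:.2f} ({landis_koch(kappa_all)})")
print(f"ET + normal:   kappa = {kappa_subset:.2f} ({landis_koch(kappa_subset)})")
```

Under these assumptions the sketch gives about 0.61 for all 26 cases and about 0.51 for the 15-case subset, in line with the reported point estimates. It does not attempt to reproduce the confidence intervals or Z statistics.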

Discussion 

The goal of case identification in epidemiological research is to
obtain a standardized diagnosis that is as accurate as possible
within the constraints of the study design and available resources. 3 The
basic tool in neurological diagnosis is expert examination. Even with
the expertise of specialists, a definite diagnosis may not be possible in
some cases during life.

Table 2. Overall and Subgroup Diagnostic Agreement

Raters                   N    κ      95% CI      Z       p
Overall                  18   0.61   0.49-0.64   38.4    <0.001
Consultant neurologists  7    0.60   0.57-0.69   14.12   <0.001
Residents                6    0.61   0.49-0.67   12.11   <0.001
Research fellows         5    0.66   0.52-0.78   10.58   <0.001

Abbreviation: CI, Confidence Interval.

Misclassification of disease status in epidemiological research dilutes
the true association between exposure and disease when the
misclassification is random, and may falsely elevate the degree of
association between an exposure and disease risk when there is systematic
identification bias. For most studies of neurological diseases, routine
clinical diagnosis by neurologists, often in conjunction with the use of 
standardized published diagnostic criteria, is the most practical method 
for case identification. Standardized diagnostic criteria are imperfect, 
but can help to ensure that various groups involved in research are in 
fact studying the same entity. However, routine clinical diagnosis 
depends on the expertise of the clinician and can be affected by 
differences in disease presentation and in the attitudes of physicians 
toward the diagnosis in different cultures. 
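
The effect of misclassification described above can be illustrated with a small numerical sketch. All numbers are hypothetical (two groups of 1,000 subjects, true risks of 20% and 10%, and invented sensitivity and specificity values); none come from NEDICES-2. The first scenario applies the same diagnostic error to both exposure groups (random misclassification), and the second applies a lower specificity only to the exposed group (a systematic identification bias).

```python
def observed_cases(true_cases: int, group_size: int,
                   sensitivity: float, specificity: float) -> float:
    """Expected number of subjects labelled as cases under imperfect diagnosis."""
    true_non_cases = group_size - true_cases
    # True cases that are detected plus non-cases wrongly labelled as cases.
    return sensitivity * true_cases + (1.0 - specificity) * true_non_cases


GROUP_SIZE = 1000          # hypothetical size of each exposure group
TRUE_EXPOSED_CASES = 200   # hypothetical true risk of 20% in the exposed
TRUE_UNEXPOSED_CASES = 100 # hypothetical true risk of 10% in the unexposed

true_rr = TRUE_EXPOSED_CASES / TRUE_UNEXPOSED_CASES

# Scenario 1: identical error in both groups (non-differential).
random_rr = (
    observed_cases(TRUE_EXPOSED_CASES, GROUP_SIZE, 0.80, 0.90)
    / observed_cases(TRUE_UNEXPOSED_CASES, GROUP_SIZE, 0.80, 0.90)
)

# Scenario 2: more false positives among the exposed only (differential).
systematic_rr = (
    observed_cases(TRUE_EXPOSED_CASES, GROUP_SIZE, 0.80, 0.80)
    / observed_cases(TRUE_UNEXPOSED_CASES, GROUP_SIZE, 0.80, 0.95)
)

print(f"true risk ratio:                {true_rr:.2f}")
print(f"random misclassification:       {random_rr:.2f}")
print(f"systematic identification bias: {systematic_rr:.2f}")
```

With these invented inputs, the risk ratio under random misclassification falls between 1 and the true value (diluted toward no association), whereas over-diagnosis confined to the exposed group pushes it above the true value.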

The current results, among researchers involved in the NEDICES-2
study, indicate that the MDS consensus diagnostic criteria are a
reliable set of criteria within the current framework. These results show
an overall substantial agreement for the diagnosis of ET, which is
similar among consultant neurologists, research fellows, and neurology
residents. Subanalyses limited to ET cases of all severities and normal
controls revealed an overall level of agreement that was lower but still
remained in the moderate range. The MDS criteria were selected because
of their simplicity and rapid application using data from the medical
history and the physical examination of cases. We attempted to minimize
the variability in the patients' medical records and examinations by
reformatting the data into a standard case record format and a
standardized physical examination, and we then required raters to
classify cases into diagnostic groups using standardized diagnostic
criteria. 9 The WHIGET criteria have the benefit of recording a
standardized neurological examination and assessing it by means of
a previously validated score. 5 However, this scale has the disadvantage
of having been validated only among experts in movement disorders. 
The present study has demonstrated an acceptable rate of agreement 
among non-specialists using a simpler diagnostic tool. The values of κ
are lower than those found in the agreement study by Louis et al.; 5 this
could be a function of the different case mix and the different levels of
expertise of the neurologists in the two studies. It could also reflect the
diagnostic tools that were used.



This study has several limitations. First, this was a reliability
study, not a validity study. In the absence of biologic markers for ET
(i.e., a diagnostic gold standard), the issue of validity becomes a
difficult one to address. Reliability becomes the only standard by
which one can judge the quality of the observations. Second, while
we assessed inter-rater agreement, we did not assess test-retest
reliability. 1 Third, the use of video-taped examinations may raise
some concerns. However, Martinez-Martin et al. 11 showed that
rating action tremors without the assistance of a teaching video-tape
was characterized by only moderate levels of inter-rater agreement.
On the other hand, the apparent amplitude of a tremor seen on a
video-screen also depends on the distance of the observers from the
screen and the size of the images, which is influenced by the amount
of zoom used by the cameraman. 12 The accuracy of the
video-recording for detecting tremor also depends on the rate of the
movement, with more information being lost the faster the tremor
frequency; thus, there is a greater reduction in the apparent
amplitudes of high- compared with low-frequency tremors. 12 Fourth,
in terms of statistical tests, the κ statistic can be quite sensitive
to the case mix: including only easy-to-diagnose cases biases the
analysis toward high agreement. An attempt to minimize this effect was
made by selecting cases with different severities of tremor, subjects
without tremor at all, and subjects with an unclear diagnosis. However,
given the variety of diagnoses, the sample size in each category is
probably lower than desirable. Therefore, the results of subgroup
analyses must be interpreted with caution. Finally, we did not test
different plausibility ratings for the diagnosis of ET (i.e., definite,
probable, and possible). The distinction between normal and possible ET
is still an area of some disagreement, with only moderate agreement
between experts.

In summary, we have demonstrated substantial agreement among
neurologists with different levels of expertise involved in a
population-based epidemiological study of ET. However, agreement rates
were lower than those previously reported using the WHIGET criteria, and
a head-to-head comparison is needed to assess which is the tool of
choice in epidemiological research in ET. A standardized training
session on reliability, with the participation of the researchers who
will be involved in the clinical assessment of NEDICES-2 participants,
would be needed to increase the reliability of the diagnosis of ET if
the MDS criteria are to be used.